Synchronous first-in/first-out block memory for a field programmable gate array

ABSTRACT

The present invention comprises a field programmable gate array that has a plurality of dedicated first-in/first-out memory logic components. The field programmable gate array includes a plurality of synchronous random access memory blocks that are coupled to a plurality of dedicated first-in/first-out memory logic components and a plurality of random access memory clusters that are programmably coupled to the plurality of dedicated first-in/first-out memory logic components and to the plurality of synchronous random access memory blocks.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of co-pending U.S. patent application Ser. No. 11/737,030, filed Apr. 18, 2007, which is a continuation of U.S. patent application Ser. No. 11/297,088, filed Dec. 7, 2005, now issued as U.S. Pat. No. 7,227,380, which is a continuation of U.S. patent application Ser. No. 10/948,010, filed Sep. 22, 2004, now U.S. Pat. No. 6,980,027, which is a continuation of U.S. patent application Ser. No. 10/448,259, filed May 28, 2003, now U.S. Pat. No. 6,838,902, issued Jan. 4, 2005, which are hereby incorporated by reference as if set forth herein.

BACKGROUND OF THE SYSTEM

1. Field of the System

The present system relates to field programmable gate array (FPGA) devices. More specifically, the system relates to a synchronous first-in/first-out memory module for an FPGA.

2. Background

FPGAs are known in the art. An FPGA comprises any number of logic modules, an interconnect routing architecture and programmable elements that may be programmed to selectively interconnect the logic modules to one another and to define the functions of the logic modules. To implement a particular circuit function, the circuit is mapped into the array and the appropriate programmable elements are programmed to implement the necessary wiring connections that form the user circuit.

An FPGA core tile may be employed as a stand-alone FPGA, repeated in a rectangular array of core tiles, or included with other functions in a system-on-a-chip (SOC). The core FPGA tile may include an array of logic modules, and input/output modules. An FPGA circuit may also include other components such as static random access memory (SRAM) blocks. Horizontal and vertical routing channels provide interconnections between the various components within an FPGA core tile. Programmable connections are provided by programmable elements between the routing resources.

An FPGA circuit can be programmed to implement virtually any set of digital functions. Input signals are processed by the programmed circuit to produce the desired set of outputs. Such inputs flow from the user's system, through input buffers and through the circuit, and finally back out to the user's system via output buffers. The bonding pad, input buffer and output buffer combination is referred to as an input/output port (I/O). Such buffers provide any or all of the following input/output (I/O) functions: voltage gain, current gain, level translation, delay, signal isolation or hysteresis.

As stated above, many FPGA designers incorporate blocks of SRAM into their architecture. In some applications, the SRAM blocks are configured to function as a first-in/first-out (FIFO) memory. A FIFO is basically a SRAM memory with automatic read and write address generation and some additional control logic. The logic needed to implement a FIFO, in addition to the SRAM blocks, consists of address generating logic and flag generating logic.

Counters are used for address generation. Two separate counters are used in this application for independent read and write operations. By definition, a counter circuit produces a deterministic sequence of unique states. The sequence of states generated by a counter is circular such that after the last state has been reached the sequence repeats starting at the first state. The circular characteristic of a counter is utilized to generate the SRAM's write and read addresses so that data is sequenced as the first data written to the SRAM is the first data read. The size of the sequence produced by the counters is matched to the SRAM address space size. Assuming no read operation, when the write counter sequence has reached the last count, the SRAM has data written to all its addresses. Without additional control logic, further write operations would overwrite existing data starting at the first address.
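The circular behavior described above can be illustrated with a minimal Python sketch. It is not part of the disclosed circuit: the class name `AddressCounter` and the eight-word depth are assumptions chosen only to show how an unchecked write counter wraps around and overwrites the oldest data.

```python
class AddressCounter:
    """Wrap-around counter sized to the memory's address space."""

    def __init__(self, depth):
        self.depth = depth   # number of addressable words (assumed value)
        self.value = 0       # common start location

    def step(self):
        """Return the current address, then advance circularly."""
        addr = self.value
        self.value = (self.value + 1) % self.depth
        return addr


DEPTH = 8                      # hypothetical small memory for illustration
memory = [None] * DEPTH
write_counter = AddressCounter(DEPTH)

# Ten writes with no intervening reads: the ninth and tenth writes wrap
# around and overwrite the data stored at addresses 0 and 1.
for item in range(10):
    memory[write_counter.step()] = item

print(memory)
```

With no control logic in place, addresses 0 and 1 end up holding the two most recent items rather than the two oldest, which is the overwrite hazard the additional FIFO logic is meant to prevent.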

Additional logic is needed to control the circular sequence of the read and write address counters in order to implement a FIFO. The control logic enables and disables the counters when appropriate and generates status flags. The read and write counters are initialized to produce a common start location. The control logic inhibits reading at any location until a write operation has been performed. When the write counter pulls ahead of the read counter by the entire length of the address space, the SRAM has data written to all its addresses. The control logic inhibits overwriting an address until its data has been read. Once the data has been read, the control permits overwriting at that address. When the read counter catches up to the write counter, the SRAM no longer contains valid data and the control logic inhibits reading until a write operation is performed.
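As a behavioral illustration of the control rules just listed, the following Python sketch models a FIFO in a single clock domain. It is a simplification under stated assumptions: the class `SimpleFifo` and its occupancy variable `count` are hypothetical, and tracking occupancy with one shared variable stands in for the counter comparison a real two-clock design would use.

```python
class SimpleFifo:
    """Single-clock behavioral model of FIFO control logic (illustrative)."""

    def __init__(self, depth):
        self.depth = depth
        self.mem = [None] * depth
        self.wr = 0      # write address counter
        self.rd = 0      # read address counter
        self.count = 0   # how far the write counter is ahead of the read counter

    def write(self, data):
        """Store data unless every address still holds unread data."""
        if self.count == self.depth:      # full: inhibit overwriting
            return False
        self.mem[self.wr] = data
        self.wr = (self.wr + 1) % self.depth
        self.count += 1
        return True

    def read(self):
        """Return the oldest unread word, or None when nothing is valid."""
        if self.count == 0:               # empty: inhibit reading
            return None
        data = self.mem[self.rd]
        self.rd = (self.rd + 1) % self.depth
        self.count -= 1
        return data


fifo = SimpleFifo(4)
write_results = [fifo.write(i) for i in range(5)]   # fifth write is refused
read_results = [fifo.read() for _ in range(5)]      # fifth read is refused
print(write_results, read_results)
```

The fifth write is refused because the write counter is a full address space ahead of the read counter, and the fifth read is refused because the read counter has caught up with the write counter.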

Output signals, known to those of ordinary skill in the art as flags, provide the system with status on the SRAM capacity available. The full and empty conditions are indicated through full and empty flags. Two additional flags are generated to warn of approaching empty or full conditions.

FPGAs have programmable logic to implement this control logic. With the availability of a SRAM block, an FPGA application may be configured to operate as a FIFO memory. Many prior art FPGAs use this approach. However, considerable FPGA gates are consumed when implementing the control logic for a FIFO in this manner and this increases the cost of the application. Also, the performance of the FIFO is likely to be limited by the speed of the control logic and not the SRAM.

Hence, there is a need for an FPGA that has dedicated logic specifically included to implement a FIFO. The FIFO logic may be included among the SRAM components in an FPGA core tile. The result is improved performance and a decrease in the silicon area needed to implement the functions, compared with implementing the FIFO function with FPGA gates.

SUMMARY OF THE SYSTEM

A field programmable gate array having a plurality of random access memory blocks coupled to a plurality of dedicated first-in/first-out memory logic components and a plurality of random access memory clusters programmably coupled to the rest of the FPGA is described.

A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description of the invention and accompanying drawings which set forth an illustrative embodiment in which the principles of the invention are utilized.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a one-tile FPGA of the present system.

FIG. 2 is a block diagram of an FPGA including multiple core tiles 102 as shown in FIG. 1.

FIG. 3 is a simplified block diagram of a synchronous random access memory (SRAM) module of the present system.

FIG. 4 is a simplified schematic diagram illustrating the FIFO logic component of the present system.

FIG. 5 is a simplified block diagram illustrating the architecture of a RAM cluster of the present system.

FIG. 6 is a simplified schematic diagram illustrating RT module, RN module, RI module and RO module of a RAM cluster of FIG. 5.

FIG. 7 is a simplified schematic diagram illustrating RC module of a RAM cluster of FIG. 5.

DETAILED DESCRIPTION OF THE INVENTION

Those of ordinary skill in the art will realize that the following description of the present invention is illustrative only and not in any way limiting. Other embodiments of the invention will readily suggest themselves to such skilled persons.

In the present disclosure, Vcc is used to define the positive power supply for the digital circuit as designed. As one of ordinary skill in the art will readily recognize, the size of a digital circuit may vary greatly depending on a user's particular circuit requirements. Thus, Vcc may change depending on the size of the circuit elements used.

Moreover, in this disclosure, various circuits and logical functions are described. It is to be understood that designations such as “1” and “0” in these descriptions are arbitrary logical designations. In a first implementation of the invention, “1” may correspond to a voltage high, while “0” corresponds to a voltage low or ground, while in a second implementation, “0” may correspond to a voltage high, while “1” corresponds to a voltage low or ground. Likewise, where signals are described, a “signal” as used in this disclosure may represent the application, or pulling “high” of a voltage to a node in a circuit where there was low or no voltage before, or it may represent the termination, or the bringing “low” of a voltage to the node, depending on the particular implementation of the invention.

FIG. 1 is a block diagram of an illustrative core tile 102 in an FPGA 100 of the present system. FPGA core tile 102 comprises an array of logic clusters 104, static random access memory (SRAM) clusters 106 and static random access memory (SRAM) modules 108. Logic clusters 104 and SRAM clusters 106 are connected together by a routing interconnect architecture (not shown) that may comprise multiple levels of routing interconnects. FPGA core tile 102 is surrounded by input/output (I/O) clusters 110, input/output (I/O) FIFO control blocks 114 and input/output banks 112. There are two rows of I/O clusters 110 on the top and bottom edges of FPGA 100 and one column of I/O clusters on the left and right edge of FPGA 100. In the present example, for illustrative purposes only, there are seven SRAM clusters 106 adjacent to and interacting with each SRAM module 108.

FIG. 2 is a block diagram of an illustrative FPGA including multiple core tiles 102 as shown as an example in FIG. 1. As shown in FIG. 2, FPGA 120 comprises four core tiles 102, though other numbers of tiles are possible. Core tiles 102 are surrounded by I/O clusters 110, input/output FIFO control blocks 114 and I/O banks 112.

FIG. 3 is a simplified block diagram of a static random access memory (SRAM) block 108 of the present system. The present system combines dedicated control logic with a two port SRAM to produce a FIFO. As set forth in FIGS. 1 and 2, there are four SRAM blocks 108 along the one side of FPGA tile 102. Each SRAM block 108 may be configured to operate as an individual SRAM module or modules may be cascaded together to produce wider or deeper memory combinations. As set forth in greater detail below, dedicated FIFO control logic has been added to each SRAM block.

Referring still to FIG. 3, SRAM block 108 comprises a SRAM component 150. SRAM component 150 is a memory component. Memory components are well known to those of ordinary skill in the relevant art and can vary greatly depending on the application. Write data bus 152 and write address bus 156 are coupled to SRAM component 150 through register 154. Write data enable signal lines 158 are each coupled to SRAM component 150 through one input of two-input XOR gates 160, 162, 164, 166, 168, AND gate 170 and register 154. The second input of two-input XOR gates 160, 162, 164, 166, 168 is provided by write enable control lines 172. Register 154 receives a clock signal through write clock signal line 159. Read address bus 174 is coupled to SRAM component 150 through register 176. Read enable signal lines 178 are each coupled to SRAM component 150 through one input of XOR gates 180, 182, 184, 186, 188, AND gate 190 and register 176. Register 176 receives a clock signal through read clock signal line 192. Input signal busses 194 and 196 provide the signals for determining the write word width and read word width respectively. Read data bus 198 is coupled to the output of SRAM component 150 through register 199 and two-input multiplexer 197.

In the present example, for illustrative purposes only, SRAM block 108 has multiple bits accessible by two independent ports: a read only port (all circuitry on the right of SRAM block 108) and a write only port (all circuitry on the left of SRAM block 108). Both ports may be independently configured in multiple words-by-bits-per-word combinations. For example, both ports may be configured as 4,096×1, 2,048×2, 1,024×4, 512×9, 256×18 and 128×36. In addition, a plurality of SRAM blocks may be cascaded together by means of busses 152, 156, 158, 174, 178, 198. In the present example, there are five enable lines for each port, one for real enable and four for higher order address bits. The ten XOR gates are used to invert or not invert the lines on a block-by-block basis, effectively making AND gates 170 and 190 decoders with programmable bubbles on the inputs. The write port is synchronous to the write clock and the read port is synchronous to the read clock. As one of ordinary skill in the art would readily recognize, the above example is illustrative only; many other configurations or memory blocks could be used.
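One way to read the "decoder with programmable bubbles" arrangement is sketched below in Python. The function names, the ordering of the five lines (the enable first, then four higher order address bits) and the way inversion bits are derived from a block index are all assumptions made for illustration; the disclosure does not specify them.

```python
def block_enable(enable_lines, invert_bits):
    """AND of five enable lines, each optionally inverted by an XOR gate."""
    assert len(enable_lines) == len(invert_bits) == 5
    return all(line ^ inv for line, inv in zip(enable_lines, invert_bits))


def invert_bits_for_block(block_index):
    """Assumed programming: invert an address line where the index bit is 0."""
    return [0] + [1 - ((block_index >> j) & 1) for j in (3, 2, 1, 0)]


def selected_blocks(enable, high_addr, num_blocks=4):
    """Return the cascaded blocks whose decoder output is active."""
    lines = [enable] + [(high_addr >> j) & 1 for j in (3, 2, 1, 0)]
    return [k for k in range(num_blocks)
            if block_enable(lines, invert_bits_for_block(k))]


print(selected_blocks(1, 2))   # enable asserted, higher order address = 2
print(selected_blocks(0, 2))   # enable deasserted: no block responds
```

Under these assumptions each cascaded block responds to exactly one value of the higher order address bits, which is how several blocks could be combined into a deeper memory without external decode logic.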

FIG. 4 is a simplified schematic diagram illustrating the FIFO logic component 200 of the present invention. FIFO logic component 200 is coupled between static random access memory (SRAM) clusters 106 and static random access memory (SRAM) block 108. In the present example, for illustrative purposes only, FIFO logic component 200 is coupled between seven static random access memory (SRAM) clusters 106 and static random access memory (SRAM) block 108. Two-input AND gate 202 has its non-inverting and inverting inputs coupled to random access memory cluster 106 via signal lines 240 and 242 respectively and an output coupled to address comparator 238, and to subtractor circuit 222 through counter 210 and to address comparator 232 through registers 218 and 220. The output of two-input AND gate 202 may also be coupled to SRAM module 108 through tri-state buffer 206. The output of counter 210 may also be coupled to SRAM block 108 through tri-state buffer 214. Two-input AND gate 204 has its non-inverting and inverting inputs coupled to SRAM cluster 106 through signal lines 244 and 246 respectively and its output coupled to address comparator 232 through counter 212. Two-input AND gate 204 also has its output coupled to address comparator 238 through counter 212, register 224 and register 226 and its output is also coupled to subtractor 222 through counter 212. The output of two-input AND gate 204 may also be coupled to SRAM module 108 through tri-state buffer 208. The output of counter 212 may also be coupled to SRAM block 108 through tri-state buffer 216. Buffers 206, 208, 214 and 216 receive their control signals from programmable configuration bits 248 of SRAM clusters 106.

Referring still to FIG. 4, subtractor circuit 222 has its output coupled to one input of magnitude comparators 234 and 236. Magnitude comparators 234 and 236 receive their second input from the programmable configuration bits 228 and 230 respectively. The configuration bits in 228 and 230 are programmable threshold values needed to generate the almost full and almost empty flags respectively.
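The subtractor and magnitude comparator arrangement can be sketched in Python as follows. This is an assumption-laden illustration: the function names, the modular subtraction and the direction of each comparison (greater-than-or-equal for almost full, less-than-or-equal for almost empty) are not taken from the disclosure.

```python
def fill_level(write_count, read_count, counter_bits):
    """Subtractor: words stored = write - read, modulo the counter range."""
    return (write_count - read_count) % (1 << counter_bits)


def almost_flags(write_count, read_count, counter_bits,
                 almost_full_threshold, almost_empty_threshold):
    """Magnitude comparators against two programmable thresholds."""
    level = fill_level(write_count, read_count, counter_bits)
    almost_full = level >= almost_full_threshold      # assumed comparison
    almost_empty = level <= almost_empty_threshold    # assumed comparison
    return almost_full, almost_empty


# Hypothetical 4-bit counters with thresholds of 6 (almost full)
# and 2 (almost empty).
print(fill_level(2, 13, 4))
print(almost_flags(3, 13, 4, 6, 2))
print(almost_flags(14, 13, 4, 6, 2))
```

The modular subtraction keeps the fill level correct after the write counter has wrapped past the read counter, as in the first call, where the write count (2) is numerically smaller than the read count (13).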

Read data bus 250 and write data bus 252 are coupled directly to SRAM block 108. When the FIFO logic component is not active, controller bits 248 are set at 0, disabling the tri-state buffers 206, 208, 214 and 216. When the SRAM is not configured as a FIFO, all input signals originate from adjacent SRAM clusters 106. When a SRAM is configured as a FIFO, a select set of signals from the RAM cluster modules are set to high impedance and FIFO logic component 200 seizes control of the signal lines. When FIFO logic component 200 is active, it seizes control of the write enable signals 158, the read enable signals 178 and the read and write address lines 174 and 156 respectively as shown in FIG. 3.

Counters 210 and 212 are binary counters; however, they also generate gray code. Gray code, or “single distance code,” is an ordering of 2^n binary numbers such that only one bit changes between any two consecutive elements. The binary value is sent to subtractor 222 to calculate the difference between the read and write counters for the almost full and almost empty flags. The gray code is sent to address comparators 232 and 238 as well as to tri-state buffers 214 and 216. The purpose of registers 218 and 220 is to synchronize the read counter address in counter 210 to the write clock signal, and the purpose of registers 224 and 226 is to synchronize the write counter address to the read clock signal for comparison purposes. Because there is no requirement that read clock signal 253 and the write clock signal be synchronous, there is no guarantee that the outputs of counter 210 will not be changing during the setup and hold time window of register 218. Because of the likelihood of change during the register setup and hold time window, there is a chance of an uncertain result. The uncertainty is limited by using gray code, which ensures that only one bit can change at a time. However the uncertainty on that one bit resolves itself, the register captures either the last address or the next address, and no other address, when the read and write addresses are compared.
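The single-bit-change property can be checked with a short Python sketch. It assumes the common reflected binary gray code, obtained by XORing a binary value with itself shifted right by one bit; the disclosure does not state which gray code ordering counters 210 and 212 use, and the helper names are hypothetical.

```python
def binary_to_gray(value):
    """Reflected binary gray code of a non-negative integer."""
    return value ^ (value >> 1)


def gray_to_binary(gray):
    """Inverse conversion: XOR together all right shifts of the code."""
    value = 0
    while gray:
        value ^= gray
        gray >>= 1
    return value


def bits_changed(a, b):
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


N_BITS = 4
codes = [binary_to_gray(i) for i in range(1 << N_BITS)]

# Every consecutive pair, including the wrap from the last code back to
# the first, differs in exactly one bit.
single_step = all(
    bits_changed(codes[i], codes[(i + 1) % len(codes)]) == 1
    for i in range(len(codes))
)

print(single_step)
print(bits_changed(7, 8))                                   # plain binary
print(bits_changed(binary_to_gray(7), binary_to_gray(8)))   # gray code
```

The step from 7 to 8 shows why this matters when a register samples a changing counter: in plain binary all four bits change at once, so a badly timed sample could capture any mixture of old and new bits, whereas in gray code only one bit is in doubt.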

When the memory is full, writing must be inhibited to prevent overwriting valid data in the SRAM. To control this, the comparison between the read and write addresses is done in the write clock (WCK) time domain since write operations are synchronous to WCK. The read address counter 210 gray code, sampled two WCK cycles in the past by registers 218 and 220, is compared to the current write address counter 212 gray code by comparator 232. If the result is equal, then the SRAM may be full and writing is inhibited. There is no way to know for certain if the SRAM is really full. The read address being compared is two WCK cycles old and one or more read operations may have occurred during that time. However, by erring on the side of safety when it is possible that the memory might be full, overwriting of data can be reliably prevented.
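The pessimistic nature of this comparison can be seen in a small Python model of the write clock domain. The class `TwoStageSync`, the four-word depth and the use of plain binary pointers with one extra wrap bit are assumptions for illustration only; the disclosed circuit compares gray codes.

```python
class TwoStageSync:
    """Two registers in series, both clocked by the destination clock."""

    def __init__(self, initial=0):
        self.stage1 = initial
        self.stage2 = initial

    def clock(self, value):
        """Apply one destination clock edge and return the visible value."""
        self.stage2 = self.stage1
        self.stage1 = value
        return self.stage2


DEPTH = 4
PTR_RANGE = 2 * DEPTH   # assumption: pointers carry one extra wrap bit


def may_be_full(write_ptr, synced_read_ptr):
    """Full is suspected when the writer is DEPTH ahead of the stale reader."""
    return (write_ptr - synced_read_ptr) % PTR_RANGE == DEPTH


sync = TwoStageSync()
write_ptr = 4    # four words written and none read: genuinely full
read_ptr = 0
history = []
for wck_edge in range(4):
    if wck_edge == 1:
        read_ptr = 1             # one read frees a location before this edge
    seen = sync.clock(read_ptr)
    history.append(may_be_full(write_ptr, seen))

print(history)
```

In this model the full indication persists for one write clock edge after a location has actually been freed. Writes are held off slightly longer than strictly necessary, but valid data is never overwritten.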

In a similar manner, when the memory is empty, reading must be inhibited to prevent outputting invalid data from the SRAM. To control this, the comparison between the write and read addresses is done in the read clock (RCK) time domain since read operations are synchronous to RCK. The write address counter 212 gray code, sampled two RCK cycles in the past by registers 224 and 226, is compared to the current read address counter 210 gray code by comparator 238. If the result is equal, then the SRAM may be empty and reading is inhibited. There is no way to know for certain if the SRAM is really empty. The write address being compared is two RCK cycles old and one or more write operations may have occurred during that time. However, by erring on the side of safety when it is possible that the memory might be empty, reading of invalid data can be reliably inhibited.

Since both a full and an empty condition are detected by equality between the read and write addresses, a way to tell the difference between the two conditions is required. This is accomplished by having an extra most significant bit (MSB) in counters 210 and 212 which is not part of the address space sent to the SRAM block (and not shown in FIG. 4 to avoid overcomplicating the disclosure and obscuring the invention). Additional logic (also not shown) inside each comparator 232 and 238 compares the read and write MSBs. When the two MSBs are equal and the read and write addresses are equal in comparator 238, this implies a possible empty condition. When the two MSBs are not equal and the read and write addresses are equal in comparator 232, this implies a possible full condition.
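The role of the extra MSB can be illustrated with the Python sketch below. It uses plain binary counter values and a hypothetical three-bit address space for readability; the helper names are assumptions, and the synchronization delay between clock domains is ignored here.

```python
ADDR_BITS = 3   # hypothetical: eight addressable words


def split(counter):
    """Separate a counter value into its extra MSB and its SRAM address."""
    address = counter & ((1 << ADDR_BITS) - 1)
    msb = (counter >> ADDR_BITS) & 1
    return msb, address


def possibly_empty(read_counter, write_counter):
    """Addresses equal and MSBs equal: the reader has caught up."""
    r_msb, r_addr = split(read_counter)
    w_msb, w_addr = split(write_counter)
    return r_addr == w_addr and r_msb == w_msb


def possibly_full(write_counter, read_counter):
    """Addresses equal but MSBs differ: the writer is one full lap ahead."""
    r_msb, r_addr = split(read_counter)
    w_msb, w_addr = split(write_counter)
    return r_addr == w_addr and r_msb != w_msb


print(possibly_empty(5, 5), possibly_full(5, 5))   # same lap, same address
print(possibly_empty(1, 9), possibly_full(9, 1))   # writer has wrapped once
```

Both cases present the same address to the SRAM, so the address bits alone cannot distinguish them. The extra bit records whether the two counters have wrapped the same number of times.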

FIG. 5 is a simplified block diagram illustrating the architecture of a RAM cluster 106 of the present system. As would be clear to those of ordinary skill in the art having the benefit of this disclosure, RAM cluster 106 may comprise any number of the logic components as indicated below. The examples set forth below are for illustrative purposes only and in no way limit the scope of the present invention. Random access memory clusters 106(0-6) further comprise two sub-clusters 300 and 302. Each sub cluster 300 and 302 has two transmitter modules 314 and two receiver modules 312. Right sub cluster 302 has a buffer module 316.

To avoid overcomplicating the disclosure and thereby obscuring the present invention, receiver modules 312, transmitter modules 314 and buffer module 316 are not described in detail herein. The implementation of receiver modules 312 and transmitter modules 314 suitable for use according to the present system is disclosed in co-pending U.S. patent application Ser. No. 10/323,613, filed on Dec. 18, 2002, and hereby incorporated herein by reference. The implementation of buffer modules 316 suitable for use according to the present system is disclosed in U.S. Pat. No. 6,727,726, issued Apr. 27, 2004, and hereby incorporated herein by reference.

In the present example, for illustrative purposes only, the interface to each SRAM block 108 is logically one RAM cluster 106 wide and seven rows long. Thus, there is a column of seven RAM clusters 106(0) through 106(6) for every SRAM block 108. Sub-clusters 300 and 302 of RAM cluster 106(0) each have one RAM clock interface input (RC) module 304, six single ended input (RT) modules 306 and two RAM interface output (RO) modules 308 in addition to the two transmitter modules 314 and two receiver modules 312 as set forth above. Right sub-cluster 302 also has a buffer module 316. RC modules 304 in RAM cluster 106(0) select the write and read clock signals from all the HCLK and RCLK networks or from signals in either of two adjacent routed channels and determine their polarity. RC modules 304 will be discussed in greater detail below. Each RT module 306 provides a control signal to SRAM module 108 which is either routed from a single channel or tied off to logic 1 or logic 0. RO modules 308 transmit read-data or FIFO flags from SRAM module 108 into an individual output track. RT modules 306 and RO modules 308 will be discussed in greater detail below.

Sub-clusters 300 and 302 of RAM clusters 106(1-6) each have three two-input RAM channel-up/channel-down non-cascadable signal (RN) modules 310, three RO modules 308 and six two-input RAM channel-up/channel-down cascadable signal (RI) modules 309 in addition to the two transmitter modules 314 and two receiver modules 312 as set forth above. Right sub cluster 302 also has a buffer module 316. RN modules 310 and RI modules 309 provide an input signal to SRAM module 108 that can be routed from two rows, the one in which it is located and the row immediately above it.

FIG. 6 is a simplified schematic diagram illustrating RT module 306, RN module 310, RI module 309 and RO module 308 of a RAM cluster of FIG. 5. RT module 306 comprises a buffer 354 that has an input programmably coupled to a horizontal routing track in routing architecture row 352. As is known to those of ordinary skill in the art, there are many types of programmable elements. Illustrative examples of such programmable elements include, but are not limited to, MOS transistors, flash memory cells and antifuses. Buffer 354 has an output that is coupled to SRAM block 108.

RN module 310 comprises a two-input AND gate 356 and a buffer 358. One input of two-input AND gate 356 is programmably coupled to a horizontal routing track in routing architecture row 350. The second input of two-input AND gate 356 is programmably coupled to a horizontal routing track in routing architecture row 352. The output of two-input AND gate 356 is coupled to SRAM module 108 through buffer 358.

RI module 309 comprises a two-input NAND gate 376 having the ability to select a signal from routing architecture row 350 or 352. Two-input NAND gate 376 has an output coupled to SRAM block 108 through tri-state buffer 380 and one inverted signal input of a two-input OR gate 378. Two-input OR gate 378 has a second input coupled to Vcc or ground and its output coupled to SRAM module 108 through tri-state buffer 380.

RO module 308 comprises a buffer 360 having an input coupled to FIFO control block 200 or SRAM block 108. The output of buffer 360 requires programming voltage protection and drives an output track which is in routing architecture row 352.

FIG. 7 is a simplified schematic diagram illustrating RC module 304 of a RAM cluster of FIG. 5. RC module 304 comprises a four-input multiplexer 362 having inputs coupled to the clock network bus 370 (not shown). Multiplexer 362 has an output coupled to a first input of a two-input multiplexer 365. The second input of two-input multiplexer 365 is selectively programmably coupled to the routing architecture in rows 372 and 374 through two-input AND gate 364. Two-input multiplexer 365 has an output coupled to an input of a two-input XNOR gate 366 that has a second input programmably coupled to Vcc or ground in routing architecture row 372. The output of XNOR gate 366 is coupled to SRAM block 108 through buffer 368.
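The clock selection path of the RC module can be summarized as a purely combinational Python sketch. All parameter names are hypothetical, as is the assumption that a single programmable bit chooses between the clock network and the routed signal; only the gate sequence (four-input multiplexer, AND gate, two-input multiplexer, XNOR gate) follows the description above.

```python
def rc_module(clock_bus, clock_select, routed_a, routed_b,
              use_routed, polarity):
    """Model of the RC module data path using 0/1 integer signal values."""
    network_clock = clock_bus[clock_select]      # four-input multiplexer 362
    routed_clock = routed_a & routed_b           # two-input AND gate 364
    chosen = routed_clock if use_routed else network_clock   # multiplexer 365
    return 1 - (chosen ^ polarity)               # XNOR gate 366


bus = [0, 1, 0, 0]   # hypothetical instantaneous values on the clock network

print(rc_module(bus, 1, 1, 1, False, 1))   # polarity tied high: passes clock
print(rc_module(bus, 1, 1, 1, False, 0))   # polarity tied low: inverts clock
```

Tying the second XNOR input to Vcc passes the selected clock unchanged, while tying it to ground inverts it, which is how the module could determine clock polarity.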

While embodiments and applications of this system have been shown and described, it would be apparent to those skilled in the art that many more modifications than mentioned above are possible without departing from the inventive concepts herein. The system, therefore, is not to be restricted except in the spirit of the appended claims. 

1. A field programmable gate array comprising: a plurality of random access memory clusters that store data; a plurality of static random access memory blocks that control transmission of data to and from said plurality of random access memory clusters wherein each of said plurality of random access memory clusters is coupled to one of said plurality of static random access memory blocks; and a plurality of dedicated first-in/first-out memory logic components for providing first-in/first-out access to said data stored in said plurality of random access memory clusters wherein each of said plurality of dedicated first-in/first-out memory logic components is coupled to one of said plurality of static random access memory blocks and to each of said plurality of random access memory clusters coupled to said one of said plurality of static random access memory blocks.